给定编程问题,预训练的语言模型(例如Codex)证明了通过采样生成多个不同代码解决方案的能力。但是,从这些样本中选择正确或最佳解决方案仍然是一个挑战。尽管验证代码解决方案正确性的一种简单方法是通过执行测试用例,但生产高质量的测试用例非常昂贵。在本文中,我们探讨了使用预训练的语言模型自动生成测试用例,称我们的方法Codet:使用生成测试的代码生成。 CODET使用生成的测试用例执行代码解决方案,然后根据与生成的测试用例和其他生成的解决方案的双重执行协议选择最佳解决方案。我们在五个具有HumaneVal和MBPP基准的不同预训练模型上评估Codet。广泛的实验结果表明,Codet可以实现对以前方法的显着,一致且令人惊讶的改进。例如,CODET将HOMANEVAL的通行证提高到65.8%,在Code-Davinci-002模型上的绝对增长率为18.8%,并且比以前的最新结果相比,绝对20+%提高。
translated by 谷歌翻译
代码生成是一个长期的挑战,旨在根据自然语言描述生成代码段。通常,昂贵的文本编码配对数据对于培训代码生成模型至关重要。最近,由于培训预培训技术的成功,大型语言模型接受了大规模未标记的代码语料库的培训,并在代码生成方面表现良好。在本文中,我们调查了如何利用未标记的代码语料库来训练以图书馆为导向的代码生成的模型。由于对于程序员重复使用第三方库是一种普遍的做法,因此由于库数量大量,文本编码配对数据很难获得。我们观察到面向图书馆的代码片段更有可能共享类似的代码草图。因此,我们为证书提供了两个步骤:草图器生成草图,然后发电机填充了草图中的详细信息。 Sketcher和Generator都使用未标记的数据在基本模型上不断预先训练。此外,我们制作了两个名为Pandaseval和NumpyeVal的基准,以评估面向图书馆的代码生成。实验结果证明了CERT的表现令人印象深刻。例如,它超过了基本模型,在pandaseval上的Pass@1方面,绝对提高了15.67%。我们的工作可在https://github.com/microsoft/pycodegpt上获得。
translated by 谷歌翻译
GPT-3和Palm等大型语言模型在几次学习中表现出色。但是,他们仍然在推理任务(例如算术基准GSM8K)上挣扎。最近的进步故意指导语言模型在产生最终答案之前生成一系列推理步骤,从而成功地将GSM8K基准从17.9%提高到58.1%,以解决问题的解决率。在本文中,我们提出了一种新的方法,即多样化的方法(关于推理步骤的多样化验证者),以进一步提高其推理能力。多样性首先探索不同的提示,以增强推理路径的多样性。其次,Diverse介绍了一个验证者,以区分好的答案和不良答案,从而获得更好的权重投票。最后,多样性验证每个步骤的正确性,而不是整体上的所有步骤。我们使用最新的语言型号Davinci-002进行广泛的实验,并证明多样化可以在八分之六的推理基准中实现新的最先进的性能(例如,GSM8K 74.4%至83.2%),超过棕榈具有540B参数的模型。
translated by 谷歌翻译
最近的语言模型预培训进展取得了巨大的成功,通过利用大规模的非结构化文本数据。然而,由于没有大规模的高质量表格数据,在结构化的表格数据上应用预先培训仍然是一项挑战。在本文中,我们提出了Tapex,以表明通过在合成语料库上学习神经SQL执行程序来实现表预培训,这是通过自动合成可执行的SQL查询和执行输出来获得的。 Tapex通过引导语言模型来模仿SQL执行程序的不同,大规模和高质量的合成语料库来解决数据稀缺性挑战。我们在四个基准数据集中评估Tapex。实验结果表明,Tapex优于以前的表格预训练,并通过大幅度达到了新的最先进的结果。这包括改进弱监管的WikiSQL表示精度为89.5%(+ 2.3%),WikityQuestions表示精度为57.5%(+ 4.8%),SQA表示精度为74.5%(+ 3.5%)和Tabfact精度84.2%(+ 3.2%)。为了我们的知识,这是通过合成可执行程序利用表预培训的第一项工作,并在各种下游任务上实现新的最先进结果。
translated by 谷歌翻译
We propose "factor matting", an alternative formulation of the video matting problem in terms of counterfactual video synthesis that is better suited for re-composition tasks. The goal of factor matting is to separate the contents of video into independent components, each visualizing a counterfactual version of the scene where contents of other components have been removed. We show that factor matting maps well to a more general Bayesian framing of the matting problem that accounts for complex conditional interactions between layers. Based on this observation, we present a method for solving the factor matting problem that produces useful decompositions even for video with complex cross-layer interactions like splashes, shadows, and reflections. Our method is trained per-video and requires neither pre-training on external large datasets, nor knowledge about the 3D structure of the scene. We conduct extensive experiments, and show that our method not only can disentangle scenes with complex interactions, but also outperforms top methods on existing tasks such as classical video matting and background subtraction. In addition, we demonstrate the benefits of our approach on a range of downstream tasks. Please refer to our project webpage for more details: https://factormatte.github.io
translated by 谷歌翻译
我们解决了将一块类似面团的变形材料塑造成预先提出的2D目标形状的问题。我们使用配备有滚动销和从RGB-D摄像头和触觉传感器收集的信息的6自由寡妇250机器人臂。我们介绍并比较了几种控制策略,包括面团收缩的作用,在三种可变形材料以及三种目标面团形状大小的广泛实验中,达到了0.90的联合(IOU)的交集。我们的结果表明:i)从最高面团的滚动面团比2D/3D面团质心更有效; ii)最好停止在当前面团边界的滚动运动,而不是目标形状轮廓; iii)只有在适当地调整外观动作时,收缩作用才可能有益; iv)与塑料或动态砂相比,Play-DOH材料更容易形成目标形状。我们的工作的视频演示可在https://youtu.be/zzlmxuitdt4上获得
translated by 谷歌翻译
嵌套命名实体识别(Nested Ner)是自然语言处理中的基本任务。已经提出了各种基于跨度的方法来检测具有跨度表示的嵌套实体。但是,基于跨度的方法不考虑跨度与其他实体或短语之间的关系,这对NER任务很有帮助。此外,由于跨度枚举长度有限,基于跨度的方法在预测长实体方面难以预测。为了减轻这些问题,我们介绍了提出的和refine网络(PNRNET),这是一个嵌套NER的两阶段集预测网络。在建议阶段,我们使用基于跨度的预测指标来生成一些粗糙的实体预测作为实体建议。在精炼阶段,建议相互互动,并将更丰富的上下文信息纳入建议表示。精致的建议表示形式用于重新预测实体边界和类。这样,可以消除粗略建议中的错误,并且边界预测不再受到跨度枚举长度限制的约束。此外,我们构建了多尺度句子表示,它可以更好地对句子的层次结构进行建模,并提供比令牌级表示更丰富的上下文信息。实验表明,PNRNET在四个嵌套的NER数据集和一个Flat NER数据集上实现了最先进的性能。
translated by 谷歌翻译
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
Image Virtual try-on aims at replacing the cloth on a personal image with a garment image (in-shop clothes), which has attracted increasing attention from the multimedia and computer vision communities. Prior methods successfully preserve the character of clothing images, however, occlusion remains a pernicious effect for realistic virtual try-on. In this work, we first present a comprehensive analysis of the occlusions and categorize them into two aspects: i) Inherent-Occlusion: the ghost of the former cloth still exists in the try-on image; ii) Acquired-Occlusion: the target cloth warps to the unreasonable body part. Based on the in-depth analysis, we find that the occlusions can be simulated by a novel semantically-guided mixup module, which can generate semantic-specific occluded images that work together with the try-on images to facilitate training a de-occlusion try-on (DOC-VTON) framework. Specifically, DOC-VTON first conducts a sharpened semantic parsing on the try-on person. Aided by semantics guidance and pose prior, various complexities of texture are selectively blending with human parts in a copy-and-paste manner. Then, the Generative Module (GM) is utilized to take charge of synthesizing the final try-on image and learning to de-occlusion jointly. In comparison to the state-of-the-art methods, DOC-VTON achieves better perceptual quality by reducing occlusion effects.
translated by 谷歌翻译